Abstract. Functional brain imaging is a source of spatio-temporal data
mining problems. A new framework hybridizing multi-objective and multi-modal
optimization is proposed to formalize these data mining problems, which are
then addressed through Evolutionary Computation (EC).
The merits of EC for spatio-temporal data mining are demonstrated as 
the approach facilitates the modelling of the experts' requirements, and 
flexibly accommodates their changing goals. 

1 Introduction 

Functional brain imaging aims at understanding the mechanisms of cognitive 
processes through non-invasive technologies such as magnetoencephalography 
(MEG). These technologies measure the surface activity of the brain with a good 
spatial and temporal resolution [8,15], generating massive amounts of data.

Finding "interesting" patterns in these data, e.g. assemblies of active neu- 
ronal cells, can be viewed as a Machine Learning or a Data Mining problem. 
However, contrasting with ML or DM applications [BJ, the appropriate search 
criteria are not formally defined up to now; in practice the detection of active 
cell assemblies is manually done. 

Building on earlier work [17], this paper formalizes functional brain imaging
as a multi-objective multi-modal optimization (MoMOO) problem, and describes
the evolutionary algorithm called 4D-Miner devised to tackle this problem. In
this paper, the approach is extended to the search for discriminant patterns;
additional criteria are devised and accommodated in order to find patterns
specifically related to particular cognitive activities.

The paper is organized as follows. Section 2 introduces the background and
notations; it describes the targeted spatio-temporal patterns (STPs) and
formalizes the proposed MoMOO framework. Section 3 describes the 4D-Miner
algorithm designed for finding STPs, hybridizing multi-objective [5] and
multi-modal [12] heuristics, and reports on its experimental validation.
Section 4 presents the extension of 4D-Miner to a new goal, the search for
discriminant STPs. Section 5 discusses the opportunities offered by
Evolutionary Data Mining, and the paper concludes with perspectives for
further research.



2 Background and Notations 



This section introduces the notations and criteria for Data Mining in functional
brain imaging, assuming the reader's familiarity with multi-objective
optimization [5]. Let $N$ be the number of sensors and let $T$ denote the number
of time steps. The $i$-th sensor is characterized by its position $M_i$ on the
skull ($M_i = (x_i, y_i, z_i) \in \mathbb{R}^3$) and its activity $C_i(t)$,
$1 \le t \le T$, along the experiment. Fig. 1 depicts a set of activity curves.

Fig. 1. Magneto-Encephalography Data (N = 151, T = 875)



A spatio-temporal pattern, noted $X = (I, i, w, r)$, is characterized by its
temporal interval $I$ ($I = [t_1, t_2] \subset [1, T]$) and a spatial region
$B(i, w, r)$. For the sake of convenience, spatial regions are restricted to
axis-parallel ellipsoids centered on some sensor; region $B(i, w, r)$ is the
ellipsoid centered on the $i$-th sensor, which includes all sensors $j$ such
that $d_w(M_i, M_j)$ is less than radius $r > 0$, with

$$d_w(M_i, M_j)^2 = w_1 (x_i - x_j)^2 + w_2 (y_i - y_j)^2 + w_3 (z_i - z_j)^2, \qquad w_1, w_2, w_3 > 0.$$
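
The membership test for an ellipsoidal region can be illustrated with a minimal Python sketch. The function names, the 0-based sensor indices and the toy coordinates are assumptions made for this illustration; they are not taken from 4D-Miner.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def weighted_sq_distance(a: Point, b: Point, w: Sequence[float]) -> float:
    """Squared weighted distance d_w(a, b)^2, with positive weights w."""
    return sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b))


def region(i: int, w: Sequence[float], r: float,
           positions: List[Point]) -> List[int]:
    """Indices j of the sensors such that d_w(M_i, M_j) < r (0-based)."""
    centre = positions[i]
    return [j for j, p in enumerate(positions)
            if weighted_sq_distance(centre, p, w) < r * r]


# Toy layout: hypothetical coordinates, not taken from the MEG data.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 0.0, 0.0)]
print(region(0, (1.0, 1.0, 1.0), 1.5, positions))   # Euclidean ball
print(region(0, (1.0, 0.25, 1.0), 1.5, positions))  # stretched along y
```

Lowering the weight on one axis stretches the ellipsoid along that axis, which is why the second call also captures the sensor located at distance 2 along y.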

This paper focuses on the detection of assemblies of active neuronal cells,
informally viewed as large spatio-temporal regions with correlated sensor
activities. Formally, let $I = [t_1, t_2]$ be a time interval, and let
$\bar{C}_i^I$ denote the average activity of the $i$-th sensor over $I$. The
$I$-alignment $\sigma_I(i,j)$ of sensors $i$ and $j$ over $I$ is defined as:

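$$\sigma_I(i,j) = \frac{\sum_{t=t_1}^{t_2} C_i(t)\, C_j(t)}{\sqrt{\sum_{t=t_1}^{t_2} C_i(t)^2 \times \sum_{t=t_1}^{t_2} C_j(t)^2}} \times \left(1 - \frac{|\bar{C}_i^I - \bar{C}_j^I|}{|\bar{C}_i^I|}\right)$$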

To every spatio-temporal pattern $X = (I, i, w, r)$ are thus associated i) its
duration or length $\ell(X)$ ($= t_2 - t_1$); ii) its area $a(X)$ (the number of
sensors in $B(i, w, r)$); and iii) its alignment $\sigma(X)$, defined as the
average of $\sigma_I(i,j)$ for $j$ ranging in $B(i, w, r)$. An interesting
candidate pattern is one with large length, area and alignment.
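
As a hedged illustration, the three criteria can be computed as below. The pairwise alignment is assumed here to be a normalized inner product of the two activity curves, multiplied by a penalty on the difference of their mean activities; this functional form, the handling of degenerate curves and all function names are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pair_alignment(ci: Sequence[float], cj: Sequence[float]) -> float:
    """Assumed form: normalized inner product times a mean-offset penalty."""
    dot = sum(a * b for a, b in zip(ci, cj))
    norm = math.sqrt(sum(a * a for a in ci) * sum(b * b for b in cj))
    mi, mj = mean(ci), mean(cj)
    if norm == 0.0 or mi == 0.0:
        return 0.0  # degenerate curves are treated as not aligned (assumption)
    return (dot / norm) * (1.0 - abs(mi - mj) / abs(mi))


def criteria(center: int, members: List[int], t1: int, t2: int,
             activity: List[List[float]]) -> Tuple[int, int, float]:
    """Return (length, area, alignment) of a pattern.

    activity[k][t] is the activity of sensor k at time step t (0-based),
    members lists the sensors of the region, [t1, t2] is the time interval.
    """
    def window(k: int) -> List[float]:
        return activity[k][t1:t2 + 1]

    sigma = mean([pair_alignment(window(center), window(j)) for j in members])
    return t2 - t1, len(members), sigma


# Toy curves (hypothetical values): sensors 0 and 1 are identical,
# sensor 2 has the same shape but twice the mean activity.
activity = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
print(criteria(0, [0, 1], 0, 3, activity))
```

In this toy case the perfectly correlated sensor 2 still gets a zero pairwise alignment with sensor 0, because the mean-offset penalty cancels the correlation term.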



Naturally, the sensor alignment tends to decrease as a longer time interval
or a larger spatial region is considered, everything else being equal;
conversely, the alignment increases when the duration or the area decreases.
The STP detection problem is thus naturally cast as a multi-objective
optimization (MOO) problem [5], searching for large spatio-temporal regions $X$
with correlated sensor activities, i.e. patterns $X$ simultaneously maximizing
criteria $\ell(X)$, $a(X)$ and $\sigma(X)$. The best compromises among these
criteria, referred to as the Pareto front, are the solutions of the problem.

Definition 1. (Pareto-domination)

Let $c_1, \ldots, c_K$ denote $K$ criteria to be simultaneously maximized on
$\Omega$. $X$ is said to Pareto-dominate $X'$ if $X$ is at least as good as $X'$
with respect to all criteria, and strictly better for at least one criterion.
The Pareto front includes all solutions which are not Pareto-dominated.
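
Definition 1 translates directly into code. The sketch below is a naive quadratic filter written for illustration, not the archive-based mechanism used by 4D-Miner; the first two criteria vectors are the values reported for the patterns of Fig. 2, the last two are made up.

```python
from typing import List, Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x is at least as good as y everywhere and strictly better once."""
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Criteria vectors (length, area, alignment).
candidates = [(8, 8, 0.29), (20, 9, 0.396), (30, 4, 0.20), (5, 3, 0.10)]
print(pareto_front(candidates))
```

A point never dominates itself, since the strict improvement on at least one criterion fails, so no special case is needed when a point is compared with the whole list.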

However, the MOO setting fails to capture the true target patterns. The Pareto
front defined from the above three criteria can be characterized, and it does
include a number of patterns; but all of these actually represent the same
spatio-temporal region up to slight variations of the time interval and the
spatial region. This was found unsatisfactory as neuroscientists are actually
interested in all active areas of the brain; $X$ might be worthwhile even
though its alignment, duration and area are lower than those of $X'$, provided
that $X$ and $X'$ are situated in different regions of the brain.

The above remark leads to extending the multi-objective optimization goal in
the spirit of multi-modal optimization [12]. Formally, a new optimization
framework is defined, referred to as multi-modal multi-objective optimization
(MoMOO). MoMOO uses a relaxed inclusion relationship, noted p-inclusion, to
relax the Pareto domination relation.

Definition 2. (p-inclusion)

Let $A$ and $B$ be two subsets of a measurable set $\Omega$, and let $p$ be a
positive real number ($p \in [0, 1]$). $A$ is p-included in $B$ iff
$|A \cap B| > p \times |A|$, where $|A|$ denotes the measure of set $A$.
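
For discrete supports, p-inclusion can be checked with plain set operations. The sketch below assumes that a support is a finite set of (sensor, time step) cells and that the measure is the counting measure; both choices, and the helper names, are assumptions of this illustration.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (sensor index, time step)


def support(sensors: Iterable[int], t1: int, t2: int) -> Set[Cell]:
    """All (sensor, time step) cells covered by a pattern."""
    return {(s, t) for s in sensors for t in range(t1, t2 + 1)}


def is_p_included(a: Set[Cell], b: Set[Cell], p: float) -> bool:
    """A is p-included in B iff |A ∩ B| > p * |A| (counting measure)."""
    return len(a & b) > p * len(a)


a = support([0, 1], 0, 9)     # 20 cells
b = support([1, 2], 5, 14)    # shares sensor 1 over time steps 5..9
print(len(a & b), is_p_included(a, b, 0.2), is_p_included(a, b, 0.5))
```

The relation is not symmetric: a small set can be p-included in a large one without the converse holding.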

Definition 3. (multi-modal Pareto domination)

Let $X$ and $Y$ denote two spatio-temporal patterns with respective supports
$Sup(X)$ and $Sup(Y)$ ($Sup(X), Sup(Y) \subset \mathbb{R}^d$). $X$ p-mo-Pareto
dominates $Y$ iff the support of $Y$ is p-included in that of $X$, and $X$
Pareto-dominates $Y$.

Finally, the interesting STPs are all spatio-temporal patterns which are not
p-mo-Pareto dominated.
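
The effect of Definition 3 can be seen on a toy example. In the sketch below, supports are assumed to be finite sets of (sensor, time step) cells with the counting measure, and the `Pattern` class, the helper names and the example patterns are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

Cell = Tuple[int, int]  # (sensor index, time step)


@dataclass(frozen=True)
class Pattern:
    name: str
    criteria: Tuple[float, float, float]  # (length, area, alignment)
    support: FrozenSet[Cell]


def cells(sensors: Iterable[int], t1: int, t2: int) -> FrozenSet[Cell]:
    return frozenset((s, t) for s in sensors for t in range(t1, t2 + 1))


def pareto_dominates(x, y) -> bool:
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))


def mo_dominates(x: Pattern, y: Pattern, p: float) -> bool:
    """x p-mo-Pareto dominates y: Sup(y) p-included in Sup(x), x dominates y."""
    overlap = len(x.support & y.support)
    return (overlap > p * len(y.support)
            and pareto_dominates(x.criteria, y.criteria))


def interesting(patterns: List[Pattern], p: float) -> List[Pattern]:
    """Patterns that no other pattern p-mo-Pareto dominates."""
    return [y for y in patterns
            if not any(mo_dominates(x, y, p) for x in patterns)]


big = Pattern("big", (20, 9, 0.396), cells(range(0, 9), 100, 120))
variant = Pattern("variant", (18, 8, 0.39), cells(range(0, 8), 101, 119))
far = Pattern("far", (8, 8, 0.29), cells(range(50, 58), 400, 408))
print([x.name for x in interesting([big, variant, far], p=0.5)])
```

The slight variation of the best pattern is discarded, whereas the pattern located elsewhere survives although it is Pareto-dominated in the usual sense.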

It must be emphasized that MoMOO differs from MOO with diversity-enforcing
heuristics (see e.g. [3,11]): diversity-based heuristics in MOO aim at a
better sampling of the Pareto front defined from fixed objectives, whereas
MoMOO is interested in a new Pareto front, including diversity as a new
objective.

3 4D-Miner 

This section describes the 4D-Miner algorithm designed for the detection of
stable spatio-temporal patterns, and reports on its experimental validation. 



3.1 Overview of 4D-Miner 

Following [3], special care is devoted to the initialization step. In order to both 
favor the generation of relevant STPs and exclude the extremities of the Pareto 
front (patterns with insufficient alignment, or insignificant spatial or temporal 
amplitudes), every initial pattern X = (i, w, I, r) is generated after a constrained 
sampling mechanism: 

- Center $i$ is uniformly drawn in $[1, N]$;

- Vector $w$ is set to $(1, 1, 1)$ ($d_w$ is initialized to the Euclidean distance);

- Interval $I = [t_1, t_2]$ is such that $t_1$ is drawn with uniform distribution
in $[1, T]$; the length $t_2 - t_1$ of $I$ is drawn according to a Gaussian
distribution $\mathcal{N}(min_\ell, min_\ell/10)$, where $min_\ell$ is a
user-supplied length parameter.

- Radius $r$ is deterministically computed from a user-supplied threshold
$min_\sigma$, corresponding to the minimal $I$-alignment desired:

$$r = \min_k \{ d_w(i, k) \text{ s.t. } \sigma_I(i, k) > min_\sigma \}$$

- Last, the spatial amplitude $a(X)$ of individual $X$ is required to be more
than a user-supplied threshold $min_a$; otherwise, the individual is
non-admissible and it does not undergo mutation or crossover.

The user-supplied $min_\ell$, $min_\sigma$ and $min_a$ thus govern the proportion
of admissible individuals in the initial population. The computational
complexity of the initialization phase is $O(P \times N \times min_\ell)$, where
$P$ is the population size, $N$ is the number of measure points and $min_\ell$
is the average length of the intervals.
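
The sampling of the center, weights and time interval can be sketched as follows. This is an illustration under stated assumptions: the second Gaussian parameter is read as a standard deviation, the interval is clipped to $[1, T]$, and the radius computation and admissibility test are left out because they require the activity data. The `Genotype` class and `sample_genotype` function are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Genotype:
    center: int                           # sensor index in [1, N]
    weights: Tuple[float, float, float]   # weights of the distance d_w
    t1: int
    t2: int


def sample_genotype(n_sensors: int, n_steps: int, min_len: float,
                    rng: random.Random) -> Genotype:
    center = rng.randint(1, n_sensors)    # uniform in [1, N]
    weights = (1.0, 1.0, 1.0)             # d_w starts as the Euclidean distance
    t1 = rng.randint(1, n_steps)          # uniform in [1, T]
    # Assumption: min_len / 10 is used as the standard deviation.
    length = max(1, round(rng.gauss(min_len, min_len / 10)))
    t2 = min(n_steps, t1 + length)        # clipping to T is an assumption
    return Genotype(center, weights, t1, t2)


rng = random.Random(0)
population = [sample_genotype(151, 875, 20, rng) for _ in range(5)]
for g in population:
    print(g)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when the thresholds are tuned over several short runs.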

The variation operators go as follows. From parent X = (i, w, I, r), mutation 
generates an offspring by one among the following operators: i) replacing center 
i with another sensor in B(i,w,r); ii) mutating w and r using self-adaptive 
Gaussian mutation; iii) incrementing or decrementing the bounds of interval $I$;
iv) generating a brand new individual (using the initialization operator). 
The crossover operator is subjected to restricted mating (only sufficiently close 
patterns are allowed to mate); it proceeds by i) swapping the centers or ii) the 
ellipsoid coordinates of the two individuals, or iii) merging the time intervals. 
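
Some of these operators can be sketched in a few lines. The code below covers mutation operators i) and iii) and the interval-merging crossover only; the one-step increment, the rejection of invalid moves and the reading of "merging" as the smallest covering interval are assumptions of this sketch, as are the function names.

```python
import random
from typing import List, Tuple


def mutate_center(center: int, members: List[int], rng: random.Random) -> int:
    """Operator i): pick another sensor of the current region as new center."""
    others = [k for k in members if k != center]
    return rng.choice(others) if others else center


def mutate_interval(t1: int, t2: int, n_steps: int,
                    rng: random.Random) -> Tuple[int, int]:
    """Operator iii): move one bound by one step; invalid moves are rejected."""
    step = rng.choice((-1, 1))
    if rng.random() < 0.5:
        t1_new, t2_new = t1 + step, t2
    else:
        t1_new, t2_new = t1, t2 + step
    if 1 <= t1_new < t2_new <= n_steps:
        return t1_new, t2_new
    return t1, t2


def merge_intervals(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Crossover iii), read here as the smallest interval covering both."""
    return min(a[0], b[0]), max(a[1], b[1])


rng = random.Random(1)
print(mutate_center(3, [1, 3, 5], rng))
print(mutate_interval(10, 20, 875, rng))
print(merge_intervals((10, 20), (15, 30)))
```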

A steady state evolutionary scheme is considered. In each step, a single admis- 
sible parent individual is selected and it generates an offspring via mutation or 
crossover; the parent is selected using a Pareto archive-based selection [5], where 
the size of the Pareto archive is 10 times the population size. The offspring ei- 
ther replaces a non-admissible individual, or an individual selected after inverse 
Pareto archive-based selection. 

3.2 Experimental results 

This subsection reports on the experiments done using 4D-Miner on real-world
datasets³, collected from subjects observing a moving ball. Each dataset
involves 151 measure points and the number of time steps (milliseconds) is 875.
As can be noted from Fig. 1, the range of activities widely varies along time.
The runtime on the available data is less than 20 seconds on a 2.4 GHz Pentium
PC.

The parameters used in the experiments are as follows. The population size
is P = 200; the stop criterion is based on the number of fitness evaluations per
run, limited to 40,000. A few preliminary runs were used to adjust the operator
rates; the mutation and crossover rates are respectively set to .7 and .3. For
computational efficiency, the p-inclusion is approximated as follows: X is
p-included in Y if the center i of X belongs to the spatial support of Y, and
there is an overlap between their time intervals. 4D-Miner is written in C++.

(a): $\ell(X) = 8$, $a(X) = 8$, $\sigma(X) = .29$   (b): $\ell(X) = 20$, $a(X) = 9$, $\sigma(X) = .396$

Fig. 2. Two stable spatio-temporal patterns (N = 151, T = 875)

Typical STPs found in the real datasets are shown in Fig. 2(a) and (b),
displaying all activity curves belonging to the STP plus the time window of the
pattern. Both patterns are considered relevant by the expert; note that
pattern (a) is Pareto-dominated by pattern (b).

All experiments confirm the importance of the user-defined thresholds
($min_\ell$, $min_\sigma$, $min_a$), defining the minimum requirements on solution
individuals. Raising the thresholds beyond certain values leads to poor final
results as the optimization problem becomes over-constrained; lowering the
thresholds leads to a crowded Pareto archive, increasing the computational time
and adversely affecting the quality of the final solutions. The coarse tuning
of the parameters can be achieved based on the desired proportion of admissible
individuals in the initial population. However, the fine-tuning of the
parameters could not be automated so far, and it still requires running
4D-Miner a few times. For this reason, the control of the computational cost
(through the population size and number of generations) is of utmost
importance.

³ Due to space limitations, the reader is referred to [17] for an extensive
validation of 4D-Miner. The retrieval performances and scalability were
assessed on artificial datasets, varying the number T of time steps and the
number N of sensors up to 8,000 and 4,000 respectively; the corresponding
computational runtime (over a 456 MB dataset) is 5 minutes on a Pentium IV PC.




4 Extension to Discriminant STPs 



After some active brain areas have been identified, the next task in the functional 
brain imaging agenda is to relate these areas to specific cognitive processes, 
using contrasted experimental settings. In this section, the catch versus no-catch 
experiment is considered; the subject sees a ball, which s/he must respectively 
catch (catch setting) or let go (no-catch setting). Cell assemblies that are found
active in the catch setting and inactive in the no-catch one, are conjectured to 
relate to motor skills. 

More generally, the mining task becomes to find STPs that behave differently
in a pair of (positive, negative) settings, referred to as discriminant STPs.
The notations are modified as follows. To the $i$-th sensor are attached its
activities in the positive and negative settings, respectively noted $C_i^+(t)$
and $C_i^-(t)$; its positions are similarly noted $M_i^+$ and $M_i^-$.

The fact that the sensor position differs depending on the setting entails that
the genotype of the sought patterns must be redesigned. An alternative would
have been to specify the 3D coordinates of a pattern instead of centering the
pattern on a sensor position. However, the spatial region of a pattern actually
corresponds to a set of sensors; in other words it is a discrete entity. The use
of a 3D (continuous) spatial genotype would thus require redesigning the spatial
mutation operator in order to ensure effective mutations. Calibrating the
continuous mutation operator and finding the right trade-off between
ineffective and disruptive modifications of the pattern position proved to be
trickier than extending the genotype.

Formally, the STP genotype, noted $X = (i, j, I, w, r)$, now refers to a pair of
sensors $i$, $j$ which are closest to each other across both settings⁴. The STP
is assessed from:

- its spatial amplitude $a^+(X)$ (resp. $a^-(X)$), defined as the size of
$B^+(i, w, r)$, including all sensors $k$ such that $d_w(M_i^+, M_k^+) < r$
(resp. $B^-(j, w, r)$, including all sensors $k$ such that
$d_w(M_j^-, M_k^-) < r$);

- its spatio-temporal alignment $\sigma^+(X)$ (resp. $\sigma^-(X)$), defined as
the activity alignment of the sensors in $B^+(i, w, r)$ (resp. in
$B^-(j, w, r)$), over time interval $I$.
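
One way to read "closest to each other across both settings" is as a mutual nearest-neighbour pairing between the sensor positions of the two settings. The sketch below implements that reading; the interpretation, the function names and the toy coordinates are assumptions of this illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sq_dist(a: Point, b: Point, w: Sequence[float]) -> float:
    """Squared weighted distance d_w(a, b)^2."""
    return sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b))


def nearest(p: Point, candidates: List[Point], w: Sequence[float]) -> int:
    """Index of the candidate position closest to p."""
    return min(range(len(candidates)), key=lambda k: sq_dist(p, candidates[k], w))


def mutual_pairs(pos_plus: List[Point], pos_minus: List[Point],
                 w: Sequence[float] = (1.0, 1.0, 1.0)) -> List[Tuple[int, int]]:
    """Pairs (i, j) where j is nearest to i and i is nearest to j."""
    pairs = []
    for i, p in enumerate(pos_plus):
        j = nearest(p, pos_minus, w)
        if nearest(pos_minus[j], pos_plus, w) == i:
            pairs.append((i, j))
    return pairs


# Hypothetical positions in the positive and negative settings.
pos_plus = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.4, 0.0, 0.0)]
pos_minus = [(0.1, 0.0, 0.0), (10.2, 0.0, 0.0)]
print(mutual_pairs(pos_plus, pos_minus))
```

The third sensor of the positive setting is left unpaired because its nearest neighbour already has a closer partner.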

The next step regards the formalization of the goal. Although neuroscientists
have a clear idea of what a discriminant STP should look like, turning this idea
into a set of operational requirements is by no means easy. Several
formalizations were thus considered, modelling the search criteria in terms of
new objectives (e.g. maximizing the difference between $\sigma^+$ and
$\sigma^-$) or in terms of constraints
($|\sigma^+(X) - \sigma^-(X)| > min_{d\sigma}$). The extension of the 4D-Miner
system to accommodate the new objectives and constraints was straightforward.

The visual inspection of the results found along the various modellings led
the neuroscientists to introduce a new feature noted $d(X)$, the difference of
the average activity in $B^+(i, w, r)$ and $B^-(j, w, r)$ over the time interval
$I$. Finally, the search goal was modelled as an additional constraint on the
STPs, expressed as $d(X) > min_d$, where $min_d$ is a user-supplied threshold.

⁴ With $j = \arg\min \{ d_w(M_i^+, M_k^-), k = 1..N \}$;
$i = \arg\min \{ d_w(M_k^+, M_j^-), k = 1..N \}$.

Also, it was deemed neurophysiologically unlikely that a functional difference 
could occur in the early brain signals; only differences occurring after the motor 
program was completed by the subject, i.e. 200ms after the beginning of the 
experiment, are considered to be relevant. This requirement was expressed in a 
straightforward way, through a new constraint on admissible STPs, and directly 
at the initialization level (e.g., drawing $t_1$ uniformly in $[200, T]$, see section 3.1).
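
The two additional constraints can be written as a simple admissibility test. The sketch below assumes that the activity difference is the signed difference of the mean activities of the two regions over the interval, and that the constraint requires this difference to exceed the threshold; the function names and the toy data are hypothetical.

```python
from typing import List


def mean_activity(activity: List[List[float]], sensors: List[int],
                  t1: int, t2: int) -> float:
    """Average activity of the given sensors over time steps t1..t2."""
    values = [activity[k][t] for k in sensors for t in range(t1, t2 + 1)]
    return sum(values) / len(values)


def activity_difference(act_pos: List[List[float]], region_pos: List[int],
                        act_neg: List[List[float]], region_neg: List[int],
                        t1: int, t2: int) -> float:
    """Mean activity in the positive setting minus that in the negative one."""
    return (mean_activity(act_pos, region_pos, t1, t2)
            - mean_activity(act_neg, region_neg, t1, t2))


def is_admissible(d: float, t1: int, min_d: float, earliest: int = 200) -> bool:
    """Constraints: difference above min_d, interval starting after 200 ms."""
    return d > min_d and t1 >= earliest


# Toy data (hypothetical): two sensors per setting, 300 time steps each.
act_pos = [[1.0] * 300, [3.0] * 300]
act_neg = [[-1.0] * 300, [-1.0] * 300]
d = activity_difference(act_pos, [0, 1], act_neg, [0, 1], 250, 280)
print(d, is_admissible(d, 250, min_d=1.0))
```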

Figs. 3(a) and 3(b) show two discriminant patterns that were found to be
satisfactory by the neuroscientists. Admittedly, this assessment of the results
pertains to the field of data mining more than discriminant learning. It is
worth mentioning that the small amount of data available in this study, plus the
known variability of brain activity in the general case (between different
persons and for the same person at different moments, see e.g. [10]), does not
permit assessing discriminant patterns (e.g. by splitting the data into
training and test datasets, and evaluating the patterns extracted from the
training set on the test set).

Fig. 3. Two discriminant stable spatio-temporal patterns. Sensors display a
positive (resp. negative) activity in the catch (resp. no-catch) case.
(N = 151, T = 2500)



Overall, the extension of 4D-Miner to the search for discriminant STPs
required i) a small modification of the genotype and ii) the modelling of two
additional constraints. An additional parameter was introduced, the minimum
difference $min_d$ on the activity level, which was tuned through a few
preliminary runs. The same parameters as in section 3.2 were used; the
computational cost is less than 25 seconds on a Pentium PC.



5 State of the art and discussion 



The presented approach is concerned with finding specific patterns in databases 
describing spatial objects along time. 



Many approaches have been developed in signal processing and computer 
science to address such a goal, ranging from Fourier Transforms to Independent 
Component Analysis [7] and mixtures of models [2]. These approaches aim at 
particular pattern properties (e.g. independence, generativity) and/or focus on 
particular data characteristics (e.g. periodicity). 

Functional brain imaging however does not fall within the range of such 
wide spectrum methods, for two reasons. Firstly, the sought spatio-temporal 
patterns are not periodic, and not independent. Secondly, and most importantly, 
it appears useless to build a general model of the spatio-temporal activity, while 
the "interesting" activity actually corresponds to a minuscule fragment of the 
total activity — the proverbial needle in the haystack. 

In the field of spatio-temporal data mining (see [18,16] for comprehensive
surveys), typical applications such as remote sensing, environmental studies, or 
medical imaging, involve complete algorithms, achieving an exhaustive search or 
building a global model. The stress is put on the scalability of the approach. 

Spatio-temporal machine learning mostly focuses on clustering, outlier de- 
tection, denoising, and trend analysis. For instance, [5] used EM algorithms for 
non-parametric characterization of functional data (e.g. cyclone trajectories), 
with special care regarding the invariance of the models with respect to
temporal translations. The main limitation of such non-parametric models,
including Markov Random Fields, is their computational complexity; this
limitation is increasingly sidestepped by using randomized search for model
estimates.

Many developments are targeted at efficient access primitives and/or complex 
data structures (see, e.g., [H]); another line of research is based on visual and 
interactive data mining (see, e.g., [9]), exploiting the unrivaled capacities of 
human eyes for spotting regularities in 2D-data. 

More generally, the presented approach can be discussed with respect to 
the generative versus discriminative dilemma in Machine Learning. Although 
the learning goal is most often one of discrimination, generative models often 
outperform discriminative approaches, particularly when considering low-level 
information, e.g. signals, images or videos (see e.g. [M]). The higher efficiency
of generative models is frequently explained by the fact that they enable the
modelling and exploitation of domain knowledge in a powerful and convenient
way, ultimately reducing the search space by several orders of magnitude.

In summary, generative ML extracts faithful models of the phenomenon at 
hand, taking advantage of whatever prior knowledge is available; these models 
can be used for discriminative purposes, though discrimination is not among the 
primary goals of generative ML. In opposition, discriminative ML focuses on the 
most discriminant hypotheses in the whole search space; it does not consider the 
relevance of a hypothesis with respect to the background knowledge per se. 

To some extent, the presented approach combines generative and discrimi- 
native ML. 4D-Miner was primarily devised with the extraction of interesting 
patterns in mind. The core task was to model the prior knowledge through 
relevance criteria, combining optimization objectives (describing the expert's 



preferences) and constraints (describing what is not interesting). The extraction 
of discriminant patterns from the relevant ones was relatively straightforward, 
based on the use of additional objectives and constraints. This suggests that ex- 
tracting discriminant patterns from relevant ones is much easier than searching 
discriminant patterns, and thereafter sorting them out to find the relevant ones. 

6 Conclusion and Perspectives 

This paper has proposed a stochastic approach for mining stable spatio-temporal 
patterns. Indeed, a very simple alternative would be to discretize the spatio- 
temporal domain and compute the correlation of the signals in each cell of the 
discretization grid. However, it is believed that the proposed approach presents 
several advantages compared to the brute force, discretization-based, alternative. 

Firstly, 4D-Miner is a fast and frugal algorithm; its good performances and
scalability have been successfully demonstrated on real-world problems and on
large-sized artificial datasets [17]. Secondly, data mining applications specifically
involve two key steps, exemplified in this paper: i) understanding the expert's 
goals and requirements; ii) tuning the parameters involved in the specifications. 
With regard to both steps, the ability of Evolutionary Computation to work 
under bounded resources is a very significant advantage. Evolutionary algorithms 
intrinsically are any-time algorithms, allowing the user to check at a low cost 
whether the process can deliver useful results, and more generally enabling her 
to control the trade-off between the computational resources needed and the 
quality of the results. 

A main perspective for further research is to equip 4D-Miner with learning
abilities, facilitating the automatic acquisition of the constraints and
modelling the expert's expectations. A first step would be to automatically
adjust the thresholds involved in the constraints, based on the expert's
feedback. Ultimately, the goal is to design a truly user-centered mining system,
combining advanced interactive optimization [13], online learning [1] and
visual data mining